1. INTRODUCTION

For a period of about three years the BBC's Research 
and Development Department has been involved in the
evaluations of multichannel sound systems, predomi- 
nantly in support of the work of the ISO/IEC MPEG 
Audio group. These have been aimed at providing a 
quality of codec for the broadcasting industry that will 
enable 3/2 surround-sound* broadcasting for such ap- 
plications as digital television, without requiring an 
impossibly high overall bit-rate. This report summa- 
rises the subjective tests that have been conducted to 
date and gives an indication of what further work is re- 
quired or is already in-hand. 



2. AUDIO CODEC TEST METHODOLOGY 

In an earlier document, Kirby et al^ described in some 
detail the test procedures used by MPEG Audio. 
Essentially, these were adopted from the ITU-R Rec- 
ommendation BS-1116,^ which had been specifically 
developed by the ITU-R for the assessment of high- 
quality audio signal processing devices. 

Overall, the Recommendation specifies transducer 
parameters, room acoustic requirements for loud- 
speaker-reproduced sounds and statistical evaluation 
techniques, as well as the basic test methodology. For 
the most critical assessments, this requires the use of at 
least 20 critical listeners, each of whom is involved in a
training session prior to the main tests in order to sensit- 
ise the listener to the type of artefact being assessed. 
The test presentation should be based on the double- 
blind/triple-stimulus/hidden-reference technique,**^ with 
the listener able to control the switching between the 
different stimuli. Thus, the tests are essentially con- 
ducted with one listener at a time. 

The 1994 MPEG tests complied with all of these aspects
of BS-1116, but it was noted that this led to a
very prolonged sequence of tests, both for the listener 
and for the two test centres (BBC and Deutsche 
Telekom AG). This made the tests rather expensive to
conduct and some simplification was required before
preliminary evaluations of 'improved' codecs were
undertaken.

* In multichannel or surround sound, the nomenclature 3/2 is used to
indicate the disposition of loudspeakers in the reproduction area (or the
allocation of recording channels during programme production); thus 3/2
signifies a front loudspeaker disposition of left, centre and right and a rear
loudspeaker disposition of left-surround and right-surround.

** In such tests the listener is presented with three stimuli labelled 'Ref', 'A'
and 'B'. 'Ref' is always the reference sound stimulus. 'A' or 'B' is also the
reference, whilst the other, 'B' or 'A', is the coded sound. The allocation
of the coded sound and the reference to 'A' and 'B' is randomised and is
not known by either the test subject or the person running the test.
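
The hidden-reference presentation described above can be sketched in a few lines of Python. This is only an illustration of the randomised allocation; the function names `make_trial` and `make_session` and the dictionary layout are assumptions for this sketch and are not taken from BS-1116 or from the MPEG test software.

```python
import random


def make_trial(rng):
    """Randomly allocate the hidden reference and the coded sound to 'A' and 'B'.

    'Ref' is always the open reference. The returned key would be held by
    the test software only; neither listener nor operator should see it.
    """
    labels = ["A", "B"]
    rng.shuffle(labels)
    hidden_ref_label, coded_label = labels
    return {
        "Ref": "reference",
        hidden_ref_label: "reference",
        coded_label: "coded",
    }


def make_session(n_trials, seed=None):
    """Build the (hypothetical) allocation key for a whole listening session."""
    rng = random.Random(seed)
    return [make_trial(rng) for _ in range(n_trials)]


if __name__ == "__main__":
    for number, trial in enumerate(make_session(5, seed=1), start=1):
        print(number, trial)
```

Keeping the allocation key inside the software, rather than on paper, is one simple way of keeping the test double-blind.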

Two approaches have been tried in an attempt to sim- 
plify the test procedures. One was to use a test panel of 
the most critical listeners and ask them to deliver a 
consensus vote for each listening trial. This worked for 
one group of tests that were carried out but it suffered 
from charges of possible bias in arriving at the consensus
vote. More significantly, from MPEG's point of
view, it did not allow direct comparison with the re- 
sults of the earlier tests, and only relative votes could 
be considered. 

More usefully, there has been an approach whereby a 
much reduced number of listeners took part, but other- 
wise the tests have been conducted according to 
BS-1116. The listeners used have been some of those 
who were identified (after the 1994 tests), as being the 
most critical and consistent, and for whom individual 
test results, from the previous tests, could be identified. 
This strategy allows direct comparison of an individ- 
ual's scores from each of the test sessions and thus 
gives an indication of likely trends in the quality of 
codec performance. By this means the most time-consuming
aspect of the BS-1116 test procedure, i.e. the
time taken to run the tests themselves, is reduced in
proportion to the number of subjects. However, the
much-reduced number of subjects rules out the use of 
standard statistical methods and one is limited to pre- 
senting individual test scores. This approach has been 
used successfully by the BBC on behalf of the MPEG 
work and has allowed preliminary assessments of 
codecs to be carried out without a significant cost penalty. 

It is worth noting that the ITU-R*** is currently debat- 
ing the whole issue of audio subjective testing and is 
specifically working on a new Recommendation for 
brief assessment methods. Close contact is being main- 
tained with both the ITU-R Task Group and the MPEG 
Audio Group on this matter. 



3. 1994 MPEG TEST RESULTS

In order to facilitate the comparisons which will fol- 
low, it is appropriate to replicate some of the results 
that were presented in the 1994 MPEG tests.^ These 
original tests were conducted at both the BBC Research
Department and at Deutsche Telekom AG in
Berlin. As the further optimisation of codecs relates to
the MPEG-2 Layers II and III and to test evaluations
carried out at BBC Research and Development Department,
it is the results for these codecs and the BBC test
centre which are replicated here.

*** This task is being carried out by Task Group 10-3.

(R024)

Figs. 1 and 2 show the results obtained in March 1994 
for MPEG-2 Layer II at the two bit-rates of 320 kbit/s 
and 384 kbit/s for the full five channels of audio. Simi- 
larly, Figs. 3 and 4 show the results for MPEG-2 Layer 
III at 256 kbit/s and 320 kbit/s. The code used to identify
the programme excerpts is explained in Table 1.

Table 1: Code used for programme excerpts.

Code   Programme excerpt
----   -----------------
Font   Fountain - centre falling water, stereo piano and bird sounds
Genz   Genzmer - blocks, cymbal, organ
Berl   Voices and background
Carn   Carnival - commentary, marching band, bell, xylophone, drums, whistle
Rock   Rock concert - guitars, fiddle, clapping
Harp   Harpsichord
Manc   Mancini - orchestral strings, cymbal, drums, horn
Pipe   Pitch pipe
Tria   Triangle
Indi   Indiana Jones movie sound track - centre voice, orchestra, strings, brass



The grading of each trial required the subject to give a
grade according to the ITU-R 5-point impairment
scale* for both stimuli 'A' and 'B' in the 'Ref' / 'A' /
'B' presentation. However, since one of these was the
hidden reference, it was required that at least one of
these grades should be '5', i.e. unimpaired. The 'diff-grade',
which is plotted on the vertical axis, is the
difference between the grade given for the coded
stimulus and that given to the hidden reference. Thus,
the higher the scores on these plots, the less impaired
the coded signal appears to have been: a diff-grade of
'0' is unimpaired, a diff-grade of '-4' is very annoying,
according to the ITU-R grading scale.

As can be seen from Figs. 1 to 4, there were a significant
number of test items, at that time, for which the
subjects noted significant impairment. Specifically,
harpsichord, pitch pipe and triangle were found to
cause this generation of codecs most problems. On that
basis, MPEG sought substantial improvements in the
audio coding quality for broadcast applications.

* The ITU-R 5-point impairment scale:
5.0  Imperceptible
4.0  Perceptible but not annoying
3.0  Slightly annoying
2.0  Annoying
1.0  Very annoying

Fig. 1 - Verification: BBC, centre position. Layer II at 320 kbit/s.
Diff-grade: grade [coded] - grade [reference], mean and 95%
confidence interval, 23 subjects. Horizontal axis: programme excerpt (see Table 1).

Fig. 2 - Verification: BBC, centre position. Layer II at 384 kbit/s.
Diff-grade: grade [coded] - grade [reference], mean and 95%
confidence interval, 23 subjects. Horizontal axis: programme excerpt (see Table 1).

Fig. 3 - Verification: BBC, centre position. Layer III at 256 kbit/s.
Diff-grade: grade [coded] - grade [reference], mean and 95%
confidence interval, 23 subjects. Horizontal axis: programme excerpt (see Table 1).

Fig. 4 - Verification: BBC, centre position. Layer III at 320 kbit/s.
Diff-grade: grade [coded] - grade [reference], mean and 95%
confidence interval, 23 subjects. Horizontal axis: programme excerpt (see Table 1).


4. IMPROVEMENTS TO MPEG-2 LAYER III 

The first major improvement in performance which the
BBC was able to report to MPEG was for the Layer III
codec (developed by Fraunhofer Institute für Integrierte
Schaltungen (FhG-IIS)), when operating in a
simulcast mode at an aggregate bit-rate of 896 kbit/s.
The basis of the simulcast mode is that the total bit-rate
is divided into 7 equal portions as follows: 5 x 128 kbit/s
for the surround-sound service and a further 2 x 128 kbit/s
for the compatible stereo service, i.e. a total of
7 x 128 kbit/s = 896 kbit/s. It has been suggested that
this arrangement may be able to deliver better quality
without increasing the overall bit-rate required, by
avoiding compatibility matrixing problems.^
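
The simulcast bit-rate budget is simple arithmetic, and the short Python sketch below merely restates it. The constant names are chosen for this illustration only.

```python
# Bit-rate budget for the simulcast mode described in the text.
CHANNEL_RATE_KBIT_S = 128   # each channel is coded independently
SURROUND_CHANNELS = 5       # 3/2 surround: L, C, R, Ls, Rs
STEREO_CHANNELS = 2         # separately coded compatible stereo pair

surround_rate = SURROUND_CHANNELS * CHANNEL_RATE_KBIT_S
stereo_rate = STEREO_CHANNELS * CHANNEL_RATE_KBIT_S
total_rate = surround_rate + stereo_rate

if __name__ == "__main__":
    print(f"surround: {surround_rate} kbit/s")
    print(f"stereo:   {stereo_rate} kbit/s")
    print(f"total:    {total_rate} kbit/s")
```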

These tests were carried out in early 1995 and reported
to the March 1995 meeting of MPEG.^ Fig. 5 shows
the surround-sound performance of the new codec. For
comparison Fig. 6 shows the March 1994 results for
Layer III at 320 kbit/s for the same listeners and the
same test items. As can be seen, a significant improvement
in quality had been achieved, partly due to better
coding algorithms and partly due to the increased
bit-rate.



5. IMPROVEMENTS TO MPEG-2 LAYER II 

Shortly after the March 1995 meeting of MPEG,
CCETT were able to offer their improved Layer II
codec for evaluation. This was also set to operate at a
total bit-rate of 896 kbit/s, but did not use the simulcast
approach. (An oddity in the MPEG-2 Standard^ allows
simulcast for Layer III but not for either Layer I or
Layer II.) The codec was evaluated by the BBC during
June 1995 and the results were reported to MPEG during
the July 1995 meeting. It is worth noting that the
CCETT codec used only a subset of the possible
MPEG-2 coding features. Specifically, the implementation
tested included compatibility matrixing (-3 dB
centre, -3 dB surround) and channel switching, but
excluded prediction, phantom coding of the centre and
dynamic crosstalk.^

Fig. 5 - FhG Layer III codec at 7 x 128 kbit/s (March 1995).

Fig. 6 - Layer III codec at 320 kbit/s (March 1994).

Fig. 7 - CCETT Layer II codec at 896 kbit/s (July 1995).
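
To make the compatibility matrixing mentioned above more concrete, the following Python sketch derives a compatible stereo pair from five surround channels with -3 dB weighting of the centre and surround signals. It assumes the commonly used downmix form Lo = L + a.C + b.Ls and Ro = R + a.C + b.Rs; the function names are hypothetical, any overall normalisation to prevent overload is omitted, and the report does not state that the CCETT implementation took exactly this form.

```python
def db_to_gain(level_db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)


def compatible_stereo(left, right, centre, left_surround, right_surround,
                      centre_db=-3.0, surround_db=-3.0):
    """Return one (Lo, Ro) sample pair from one 3/2 surround sample.

    Assumed downmix form (overall normalisation omitted):
        Lo = L + a*C + b*Ls
        Ro = R + a*C + b*Rs
    """
    a = db_to_gain(centre_db)
    b = db_to_gain(surround_db)
    lo = left + a * centre + b * left_surround
    ro = right + a * centre + b * right_surround
    return lo, ro


if __name__ == "__main__":
    # A centre-only signal appears equally in both compatible channels.
    print(compatible_stereo(0.0, 0.0, 1.0, 0.0, 0.0))
```

Because the transmitted Lo and Ro already contain the centre and surround signals, a decoder has to subtract them again to recover L and R, which is where coding noise can become unmasked.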

The results of these tests are shown in Fig. 7. For com- 
parison Fig. 8 shows the March 1994 results for Layer 
II at 384 kbit/s for the same listeners and the same test 
items. As in the case of the Layer III results, significant
improvements in quality were being recorded. 



6. COMMENTS ON THE LAYER II AND III 
RESULTS 

Whilst the above results were very encouraging and
showed a marked improvement on what was reported
in March 1994, the process of optimisation and assessment
was not, and is still not, complete.

In the case of both Layer III and Layer II a significant 
increase in data-rate had been required in order to 
achieve the necessary improvement in quality for the 
all-important high-quality broadcast application. It was 
therefore decided, by all concerned (i.e. the MPEG 
partners), that it would be most advantageous if the bit- 
rate could be reduced below the 896 kbit/s level in 
order to allow the data to be used for other services. 

In both cases these were very brief tests. Better cer- 
tainty in the results would come from fuller 
evaluations according to the full rigour of BS-1116, 
particularly with a wider range of listeners. However, 
it was noted that as the more recent results reported 
above represented the judgement of the most critical 
listeners (expert listeners), an overall judgement from
a larger number of listeners might be expected to 
return a more favourable result: if anything, these re- 
sults tended towards the worst case that might be 
expected. 

Additionally, these tests record the performance after 
only one coding/decoding operation. In real use, 
broadcasters will need to employ codecs that can cope 
with multiple coding and decoding operations, i.e. cas- 
cadability. This must be assessed. (To be strictly 
correct, the Layer III coding evaluated here is just 
seven parallel mono-coding operations according to al- 
gorithms that have already been proven in a stereo- 
cascaded format.) 

Finally, to be complete, the quality of the compatible
stereo coding should also be quantified. (Once more,
for Layer III this should be a foregone conclusion, as it
has already been assessed by the ITU-R.^)

At the time of writing, further optimisation of the 
Layer II codec has taken place and further tests are be- 
ing conducted by the BBC and Deutsche Telekom AG, 
as part of the RACE dTTb project. The tests are exam- 
ining aspects of multichannel coding, cascadability and 
stereo performance. A later report will present the results
of this work.



7. TESTS ON NON-BACKWARDS 
COMPATIBLE CODING OPTIONS 

One of the requirements for MPEG-2 codecs was that 
they were to be backwards compatible (BC) to MPEG-1. 
That is, some part of the coded audio data should be 
accessible to a standard MPEG-1 decoder to enable it 
to decode the signal to give a stereo service. Only new 
MPEG-2 decoders would use all the data and would 
thus provide a surround-sound service. 

However, some workers in this field, including the 
author, felt that there was likely to be a conflict be- 
tween data reduction and compatibility matrixing.^ A 
second line of enquiry was opened within MPEG as a 
direct result of the poor quality of the 1994 audio 
codecs and the perceived potential for this possible 
conflict. The enquiry was to enable non-backwards 
compatible (NBC) codecs to be developed. 

The basic approach to this development was to use a 
software based Reference Model methodology, and to 
pool all ideas on coding in order to optimise the build- 
ing blocks within the Reference Model. ^^ MPEG 
Audio's first NBC Reference model is shown in Fig. 9. 

During the period November 1994 to July 1995 vari- 
ous options for the first block (the time/frequency 
transformation block) were optimised, then exchanged 
between codec developers and evaluated. The aim was 
to select the best option for the time/frequency module 
which would subsequently be passed on to Reference 
Model 2, with a view to enabling the other building
blocks to be separately optimised later.

It should be noted that at such an early stage in the de- 
velopment of a coding scenario, it is not appropriate to 
try and obtain grades of absolute quality of the codecs: 
it is sufficient to obtain relative grades or rank orders. 
Thus, many simplifications in the test arrangements 
and procedures can be accommodated to reduce and 
share the task of subjective assessments. 

In contrast to the BC codec tests, these NBC codec
evaluations took place in a large number of test centres,
and all the results, from 68 test subjects on 22
transformation options, were combined. For these
tests, which were mono only, a bit-rate of 64 kbit/s per
channel was chosen. Additionally, to eliminate problems
due to test environment acoustics, high-quality
headphones were used. The results have been reported
fully in the 1995 NBC test report^ and an example of
these results, averaged over all programme items, is
shown in Fig. 10. Table 2 gives details of the
transforms' resolutions and other parameters.

Fig. 9 - NBC reference model version 1.

As can be seen, there is an approximate split of results,
above and below a diff-grade of -1.5. With one exception,
all of those codecs scoring above -1.5 shared the
highest resolution of 1024 spectral lines. Similar quality
advantages were found for characteristics like
switched resolutions and prediction.

Fig. 10 - MPEG audio NBC time/frequency module tests.
Headphones: all programme items. Horizontal axis: module (see Table 2).

The full results were discussed in detail at the July 
1995 meeting of MPEG and allowed the following op- 
tions to be selected for the time/frequency module of 
Reference Model 2: 

• MDCT with switched 1024-128 line resolu- 
tion, based on the proposal of FhG-IIS and 
AT&T. 

• Windowing, based on the proposal of Dolby. 

• Prediction, based on the proposal of Hannover 
University. 
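
For readers unfamiliar with the MDCT named in the first option, the following Python sketch shows a direct (slow) forward and inverse MDCT with a sine window and 50% overlap-add. It is a textbook formulation written for illustration only: the function names are hypothetical, the sine window is an assumption rather than the window proposed by Dolby, and the switching between 1024-line and 128-line resolution is not implemented (the block length is simply a parameter).

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n, k, half):
    """MDCT cosine kernel for sample n and spectral line k."""
    return math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))


def mdct(block):
    """Windowed forward MDCT: 2N time samples -> N spectral lines."""
    length = len(block)
    half = length // 2
    w = sine_window(length)
    return [sum(w[n] * block[n] * _basis(n, k, half) for n in range(length))
            for k in range(half)]


def imdct(coeffs):
    """Windowed inverse MDCT: N spectral lines -> 2N aliased time samples."""
    half = len(coeffs)
    length = 2 * half
    w = sine_window(length)
    return [w[n] * (2.0 / half)
            * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(length)]


def analyse_synthesise(signal, half):
    """Transform overlapping blocks and overlap-add the inverse transforms."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        frame = imdct(mdct(signal[start:start + 2 * half]))
        for n, value in enumerate(frame):
            out[start + n] += value
    return out


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) for n in range(32)]
    rebuilt = analyse_synthesise(signal, half=8)
    # Samples covered by two overlapping blocks are recovered; edges are not.
    print(max(abs(rebuilt[n] - signal[n]) for n in range(8, 24)))
```

A longer block gives finer frequency resolution (more spectral lines), whereas a shorter block limits the spread of coding noise in time around transients; this is the trade-off that a switched-resolution transform addresses.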

At the time of writing, the NBC optimisation process is
in progress. The first phase of module optimisation
has been carried out; the latest test results will be
presented to MPEG during its January 1996 meeting.
These will quantify the benefits or otherwise of
five different options for improving the Reference 
Model software. The current evaluations are still only 
based on mono channel coding and still use the simpli- 
fied test procedure. Once this phase of optimisation is 
complete, such things as stereo, multichannel coding 
and full BS-1116 assessments will follow. The target 
for the NBC Standard is March 1997, so a great deal of 
work for many MPEG members still lies ahead.



8. CONCLUSIONS 

The 1994 MPEG report* recorded the first formal as- 
sessments on MPEG-2 (and two non-MPEG-2, NBC) 
audio codecs and showed that, for the most critical 
items, significant levels of impairment were recorded 
for all codecs. 

Table 2: Code used for time/frequency modules.

Module code   t/f module
-----------   ----------
A             1024-128 MDCT
B             1024-128 MDCT+prediction
C             512-128 MDCT
D             1024-128 MDCT
E             1024-128 MDCT
F             1024-128 MDCT
G             1024-128 MDCT+prediction+windows
H             1024-128 MDCT
I             1024-128 MDCT
J             68-96 MDCT
K             1024-128 MDCT+prediction
L             64 MDCT
M             32 band uniform filterbank
N             48 band uniform filterbank
O             48 band uniform filterbank
P             64 band uniform filterbank
Q             64 band uniform filterbank
R             128 band uniform filterbank
S             128 band uniform filterbank
T             1024 hybrid filterbank
U             1024 hybrid filterbank
V             256 hybrid filterbank



Since that time, the BC codec developers have carried out 
further optimisation of their algorithms and further 
evaluations have been carried out. These show a marked
move towards the quality/bit-rate targets that must be 
achieved for the important broadcast applications. 

A second study has been working in parallel to the BC 
codec optimisation, namely the NBC mode. The first 
tests on the NBC time/frequency module are reported 
here briefly. These results have enabled choices to be 
made between the different resolutions for the time/frequency
transform block, such that an improved
Reference Model was able to go forward to the next 
rounds of codec optimisation. 

Optimisation within both the BC and NBC coding
studies is continuing.